Dr. Joseph H. Bredekamp 
Applied Information Systems Research Program 
NASA Office of Space Science, Code ST 
300 E Street, S.W. 
Washington, DC 20546-0001 

Dear Joe: 

This is the final report of our NASA AISRP grant entitled “High Performance Parallel Methods 
for Space Weather Simulations” (award number: NAG5-9406). The main thrust of the proposal 
was to achieve significant progress towards new high-performance methods which would greatly 
accelerate global MHD simulations and eventually make it possible to develop first-principles-based 
space weather simulations that run much faster than real time. I am pleased to report that with 
the help of this award we made major progress in this direction and developed the first parallel 
implicit global MHD code with adaptive mesh refinement. 

The main limitation of all earlier global space physics MHD codes was the explicit time stepping 
algorithm. Explicit time steps are limited by the Courant-Friedrichs-Lewy (CFL) condition, which 
essentially ensures that no information travels more than a cell size during a time step. This 
condition represents a non-linear penalty for highly resolved calculations, since finer grid resolution 
(and consequently smaller computational cells) not only results in more computational cells, but 
also in smaller time steps. 
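As a rough illustration of this penalty, the Python sketch below assumes a uniform 3D grid, a fixed maximum wave speed, and a CFL number of 0.8 (an illustrative value, not one taken from this report). Dividing the cell size by r multiplies the number of cells by r^3 and, through the CFL limit, the number of time steps by r, so the work grows as r^4.

```python
# Minimal sketch of the CFL penalty under simple assumptions:
# uniform grid, fixed maximum wave speed, illustrative CFL number 0.8.

def cfl_time_step(dx: float, max_speed: float, cfl: float = 0.8) -> float:
    """Largest stable explicit step: signals cross at most cfl * dx per step."""
    return cfl * dx / max_speed


def relative_work(refinement: int, ndim: int = 3) -> int:
    """Cells x time steps after dividing dx by `refinement`, relative to the baseline."""
    more_cells = refinement ** ndim  # finer grid: more cells
    more_steps = refinement          # smaller dt (CFL): more steps to reach the same end time
    return more_cells * more_steps


for r in (1, 2, 4):
    print(f"cell size / {r}: dt = {cfl_time_step(1.0 / r, 1.0):.3f}, work x {relative_work(r)}")
```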

In global MHD simulations of space plasmas, the CFL condition is controlled by two factors: (1) 
the smallest cell size in the simulation, and (2) the fast magnetosonic speed in high magnetic field, 
low plasma density regions. In a typical magnetosphere simulation with a smallest cell size of 
about 0.25 R_E (Earth radii), the CFL condition limits the time step to about 10^-2 s. This small step is primarily 
controlled by the high fast magnetosonic speed (due to the high Alfven speed) in the near-Earth 
region. 
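The quantities that set this limit can be sketched with the standard ideal-MHD expressions v_A = B/sqrt(mu0 rho) for the Alfven speed and c_s = sqrt(gamma p/rho) for the sound speed, taking the fast magnetosonic speed at its maximum over propagation directions, sqrt(c_s^2 + v_A^2). The field, density, pressure, and CFL number in this Python example are illustrative assumptions, not values from the simulations described here, so the printed time step is only indicative.

```python
import math

MU0 = 4.0e-7 * math.pi   # vacuum permeability [T m/A]
M_P = 1.67262192e-27     # proton mass [kg]
R_E = 6.371e6            # Earth radius [m]


def alfven_speed(b: float, rho: float) -> float:
    """v_A = B / sqrt(mu0 * rho), SI units."""
    return b / math.sqrt(MU0 * rho)


def fast_speed(b: float, rho: float, p: float, gamma: float = 5.0 / 3.0) -> float:
    """Fast magnetosonic speed for propagation perpendicular to B (its maximum)."""
    cs2 = gamma * p / rho
    return math.sqrt(cs2 + alfven_speed(b, rho) ** 2)


def cfl_dt(dx: float, u: float, cf: float, cfl: float = 0.8) -> float:
    """Explicit step limited by the fastest signal speed |u| + c_f."""
    return cfl * dx / (abs(u) + cf)


# Illustrative inputs (assumptions, not values from the report)
b = 1.0e-6          # 1000 nT
rho = 1.0e6 * M_P   # 1 proton per cm^3
p = 1.0e-11         # Pa
dx = 0.25 * R_E

cf = fast_speed(b, rho, p)
dt = cfl_dt(dx, 0.0, cf)
print(f"v_A = {alfven_speed(b, rho) / 1e3:.0f} km/s, dt = {dt:.3f} s")
```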

There are two main ways to increase the time step in an MHD simulation and thus decrease the time 
to solution: a “slow speed-of-light” version of the so-called “Boris correction,” and implicit time 
stepping. The “Boris correction” addresses the fact that in non-relativistic MHD the Alfven speed 
(and the fast magnetosonic speed) can exceed the speed of light in low density, high magnetic field 
regions. 

The “slow speed-of-light Boris correction” is used to increase the time step in global MHD simulations 
of the magnetosphere by artificially reducing the value of the speed of light. Since the CFL 
condition is controlled by the fastest wave speed in the simulation domain (which is usually found 
in regions where the computational cells are the smallest), one can increase the time step by a 
factor of 10 to 100 by limiting the speed of light to a value between c/100 and c/500. This is a fairly 
efficient way to speed up the code, and it is also easy to implement. However, this speed-up comes at the 
cost of accuracy in the temporal evolution: the more the Alfven speed is limited, the more the fast physics is 
compromised. In practice, the benefit seems to outweigh the penalty up to a point. However, the 
penalty also gives us a reason to look for an alternative. 
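The effect of the correction on the Alfven speed can be sketched with the commonly used semi-relativistic form v_A,eff = v_A c / sqrt(v_A^2 + c^2), which approaches v_A when v_A << c and is capped near c when v_A >> c. The Python sketch below applies it with a reduced c. The uncorrected Alfven speed is an illustrative assumption, and the sketch simplifies by treating the Alfven speed as the fastest wave speed (neglecting the sound speed and the flow velocity), so the time step gain is simply the ratio of the two speeds.

```python
import math

C_LIGHT = 2.99792458e8  # speed of light [m/s]


def boris_alfven_speed(v_a: float, c_eff: float) -> float:
    """Alfven speed limited by a (possibly artificially reduced) speed of light c_eff."""
    return v_a * c_eff / math.sqrt(v_a ** 2 + c_eff ** 2)


v_a = 3.0e7  # illustrative uncorrected Alfven speed [m/s] (assumption)

for divisor in (100, 500):
    c_eff = C_LIGHT / divisor
    # With the CFL condition dt ~ dx / speed, the step grows by the speed ratio.
    gain = v_a / boris_alfven_speed(v_a, c_eff)
    print(f"c/{divisor}: time step larger by about {gain:.0f}x")
```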

An alternative is to use implicit time stepping. This is the method we use to speed 
up our adaptive MHD code BATS-R-US. As described below, we implemented a parallel 
implicit time stepping algorithm and achieved a factor of 100 to 1,000 increase in the MHD 
time step. The implicit time stepping came with considerable computational overhead, but 
we still gained a factor of 10 to 100 in code speedup from this improvement. 
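Why implicit steps can exceed the CFL limit is visible on the simplest model problem: 1D linear advection with first-order upwind differencing on a periodic grid. This toy Python sketch (not the BATS-R-US discretization) takes 20 steps at ten times the explicit stability limit. Forward Euler amplifies every nonzero Fourier mode and blows up, while backward Euler stays bounded. Stability is not accuracy, however: at such large steps backward Euler is also strongly diffusive.

```python
import numpy as np


def explicit_step(u, nu):
    """Forward Euler + first-order upwind (speed a > 0, periodic); stable only for nu <= 1."""
    return u - nu * (u - np.roll(u, 1))


def implicit_matrix(n, nu):
    """Backward Euler + upwind: (1 + nu) u_i^new - nu u_{i-1}^new = u_i^old, periodic."""
    shift = np.roll(np.eye(n), 1, axis=0)  # shift[i, i-1] = 1
    return (1.0 + nu) * np.eye(n) - nu * shift


n, nu, steps = 100, 10.0, 20  # nu = a * dt / dx, i.e. 10x the explicit CFL limit
x = (np.arange(n) + 0.5) / n
u0 = np.exp(-((x - 0.5) / 0.05) ** 2)

u_exp = u0.copy()
for _ in range(steps):
    u_exp = explicit_step(u_exp, nu)

a_imp = implicit_matrix(n, nu)
u_imp = u0.copy()
for _ in range(steps):
    u_imp = np.linalg.solve(a_imp, u_imp)  # one linear solve per implicit step

print("explicit max |u|:", np.abs(u_exp).max())
print("implicit max |u|:", np.abs(u_imp).max())
```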

We combined explicit and implicit time stepping in our magnetosphere simulations. Magnetosphere 
simulations usually include large volumes where the Alfven speed is quite low (tens of km/s) and 
the local CFL condition would allow large explicit time steps (tens of seconds to several minutes). 
In these regions implicit time stepping is a waste of computational resources. Since our parallel 
implicit technique is fundamentally block based, we treat implicitly only those blocks where the 
CFL condition would limit the explicit time step to less than ~10 s. This combined 
explicit-implicit time stepping poses additional computational challenges (such as separate load balancing 
of explicit and implicit blocks), but the gain is so high that it is worth the price. 
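The block selection rule can be sketched as follows. The `Block` record, the CFL number of 0.8, and the sample block values are hypothetical; only the ~10 s threshold comes from the text. Round-robin assignment is a naive stand-in for a real load balancer. What it illustrates is that explicit and implicit blocks are distributed separately, so that every processor receives a share of each kind of work.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical per-block summary (not the BATS-R-US data structure)."""
    block_id: int
    dx: float         # smallest cell size in the block [m]
    max_speed: float  # max of |u| + fast magnetosonic speed in the block [m/s]


def explicit_dt(block: Block, cfl: float = 0.8) -> float:
    return cfl * block.dx / block.max_speed


def split_blocks(blocks, dt_threshold=10.0, cfl=0.8):
    """Blocks whose explicit CFL step would be < dt_threshold [s] go to the implicit list."""
    explicit, implicit = [], []
    for b in blocks:
        (implicit if explicit_dt(b, cfl) < dt_threshold else explicit).append(b)
    return explicit, implicit


def round_robin(blocks, n_procs):
    """Naive stand-in for load balancing: deal blocks out to processors in turn."""
    return [[b.block_id for b in blocks[p::n_procs]] for p in range(n_procs)]


blocks = [
    Block(0, 1.6e6, 2.0e7),   # near-Earth: small cells, fast waves
    Block(1, 1.6e6, 1.5e7),
    Block(2, 1.28e7, 5.0e4),  # outer magnetosphere: large cells, slow waves
    Block(3, 1.28e7, 8.0e4),
    Block(4, 6.4e6, 4.0e5),
]
explicit, implicit = split_blocks(blocks)
# Balance each class separately so every processor gets a share of both kinds of work.
print("explicit:", round_robin(explicit, 2))
print("implicit:", round_robin(implicit, 2))
```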

With this objective in mind, we developed and implemented a parallel implicit Newton-Krylov-Schwarz (NKS) 
method to improve the performance and efficiency of the upwind finite-volume scheme for solving the 
equations of MHD. We use a high-resolution upwind treatment of the convective fluxes and a central 
treatment of the viscous fluxes. All of this is carried out using the block-adaptive data structure 
of our MHD code. Newton-Krylov methods are iterative techniques that have proven very effective 
in solving the large, sparse nonlinear systems that arise from discretized PDEs. They are based on 
inexact Newton iteration combined with a Krylov method, such as GMRES, and can be applied in a 
Jacobian-free manner. In the last ten years, these approaches have gained wide acceptance and have 
been applied to many problems in science and engineering. 
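A minimal Jacobian-free Newton-Krylov sketch in Python/NumPy, applied to a small model nonlinear problem rather than the MHD equations, is shown below. The Jacobian is never formed: GMRES only needs products J v, approximated by the finite difference [F(u + eps v) - F(u)]/eps. The fixed linear tolerance `eta` (which makes the Newton iteration inexact), the choice of `eps`, the unrestarted GMRES, and the absence of a preconditioner are illustrative simplifications.

```python
import numpy as np


def gmres(matvec, b, tol=1e-3, max_iter=50):
    """Unrestarted GMRES from x0 = 0; stops when ||b - A x|| < tol * ||b||."""
    beta = np.linalg.norm(b)
    if beta == 0.0:
        return np.zeros_like(b)
    q = np.zeros((b.size, max_iter + 1))
    h = np.zeros((max_iter + 1, max_iter))
    q[:, 0] = b / beta
    for k in range(max_iter):
        w = matvec(q[:, k])
        for j in range(k + 1):  # modified Gram-Schmidt (Arnoldi)
            h[j, k] = q[:, j] @ w
            w = w - h[j, k] * q[:, j]
        h[k + 1, k] = np.linalg.norm(w)
        e1 = np.zeros(k + 2)
        e1[0] = beta
        y = np.linalg.lstsq(h[:k + 2, :k + 1], e1, rcond=None)[0]
        if np.linalg.norm(h[:k + 2, :k + 1] @ y - e1) < tol * beta or h[k + 1, k] < 1e-14:
            break
        q[:, k + 1] = w / h[k + 1, k]
    return q[:, :k + 1] @ y


def jfnk(residual, u0, newton_tol=1e-8, max_newton=20, eta=1e-3):
    """Inexact Newton: solve J du = -F only to relative tolerance eta, never forming J."""
    u = u0.copy()
    for _ in range(max_newton):
        f = residual(u)
        if np.linalg.norm(f) < newton_tol:
            break
        eps = np.sqrt(np.finfo(float).eps) * (1.0 + np.linalg.norm(u))
        # Directional derivative J v ~ (F(u + eps v) - F(u)) / eps; GMRES passes unit vectors v.
        jv = lambda v: (residual(u + eps * v) - f) / eps
        u = u + gmres(jv, -f, tol=eta, max_iter=u.size)
    return u


n = 31
h_grid = 1.0 / (n + 1)


def residual(u):
    """Model problem (not MHD): -u'' + u^3 - 1 = 0 on (0, 1), u(0) = u(1) = 0."""
    up = np.concatenate(([0.0], u, [0.0]))
    lap = (up[:-2] - 2.0 * up[1:-1] + up[2:]) / h_grid ** 2
    return -lap + u ** 3 - 1.0


u = jfnk(residual, np.zeros(n))
print("||F(u)|| =", np.linalg.norm(residual(u)))
```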

Newton-Krylov methods require preconditioning to be scalable in the algorithmic (convergence 
rate) sense. One successful approach to the high-performance parallelization of these methods, with 
data placement consistent with the best domain decomposition parallelization of the Newton-Krylov 
method, is additive Schwarz preconditioning. The action of the Schwarz preconditioner is 
carried out independently on subdomain problems, so this approach appears well suited to fully 
exploiting the potential of distributed-memory multiprocessor machines. Keyes and co-workers 
have achieved very encouraging first results with parallel implementations of NKS algorithms. 
In general, good parallel efficiency was achieved on up to 6144 processors (ASCI Red); specifically, 
the parallel efficiency between 128 and 3072 processors was 70%. 
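The structure of a one-level additive Schwarz preconditioner, M^-1 = sum_i R_i^T (R_i A R_i^T)^-1 R_i, can be sketched on a model 1D Laplacian split into overlapping contiguous subdomains. Each term involves only one subdomain, which is why its action can be computed independently on each processor. The dense inverses, subdomain count, and overlap width below are illustrative; in an NKS solver the preconditioner would instead be built from (an approximation of) the Jacobian and applied inside GMRES.

```python
import numpy as np


def laplacian_1d(n):
    """Model SPD matrix: tridiag(-1, 2, -1)."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)


def subdomains(n, n_sub, overlap):
    """Contiguous index sets, each extended by `overlap` cells into its neighbors."""
    size = n // n_sub
    sets = []
    for s in range(n_sub):
        lo = max(0, s * size - overlap)
        hi = n if s == n_sub - 1 else min(n, (s + 1) * size + overlap)
        sets.append(np.arange(lo, hi))
    return sets


def additive_schwarz(a, sets):
    """M^-1 = sum_i R_i^T (R_i A R_i^T)^-1 R_i, assembled densely for illustration."""
    m_inv = np.zeros_like(a)
    for idx in sets:
        a_local = a[np.ix_(idx, idx)]            # subdomain matrix R_i A R_i^T
        m_inv[np.ix_(idx, idx)] += np.linalg.inv(a_local)
    return m_inv


def condition_number(mat):
    """Ratio of extreme eigenvalues (real for SPD A and for M^-1 A with SPD M^-1)."""
    ev = np.linalg.eigvals(mat).real
    return ev.max() / ev.min()


n = 64
a = laplacian_1d(n)
m_inv = additive_schwarz(a, subdomains(n, n_sub=4, overlap=2))
print(f"cond(A)      = {condition_number(a):.0f}")
print(f"cond(M^-1 A) = {condition_number(m_inv @ a):.0f}")
```

Comparing the two printed condition numbers shows how much the preconditioner narrows the spectrum. Without a coarse-grid correction, that improvement typically degrades as the number of subdomains grows.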

Schwarz preconditioning, combined with the block-based data structure and Cartesian mesh domain 
decomposition, is very appropriate for high-performance MHD codes. Additive overlapping Schwarz 
preconditioning in conjunction with a Newton-Krylov method for the parallel implicit solution of 
the ideal and resistive MHD equations on block-based adaptive Cartesian meshes was implemented in 
several stages. Initially, we adopted an inexact matrix-free Newton's method with GMRES as the 
Krylov subspace method, and investigated the effects of subdomain granularity, domain overlap, 
and subdomain solvers and preconditioners on the performance of the method. In this way we 
developed an effective parallel implicit NKS solver for the MHD equations that achieves both high 
parallel performance and good scalability. Attention was given both to parallel performance issues, 
such as load balancing and message passing, and to serial performance issues, such as memory and 
cache usage and inner loop optimizations. Portability of the implicit NKS solvers across different 
hardware platforms has also been ensured. 

Very truly yours, 


Tamas I. Gombosi 